Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm
In this work, we study 1-bit convolutional neural networks (CNNs), in which
both the weights and activations are binary. While efficient, current 1-bit
CNNs achieve much lower classification accuracy than their real-valued
counterparts on large-scale datasets such as ImageNet. To minimize the
performance gap between the 1-bit and real-valued CNN
models, we propose a novel model, dubbed Bi-Real net, which connects the real
activations (after the 1-bit convolution and/or BatchNorm layer, before the
sign function) to activations of the consecutive block, through an identity
shortcut. Consequently, compared to the standard 1-bit CNN, the
representational capability of the Bi-Real net is significantly enhanced and
the additional computational cost is negligible. Moreover, we develop a
specific training algorithm including three technical novelties for 1-bit
CNNs. Firstly, we derive a tight approximation to the derivative of the
non-differentiable sign function with respect to the activations. Secondly, we
propose a magnitude-aware gradient with respect to the weight for updating the
weight parameters. Thirdly, we pre-train the real-valued CNN model with a clip
function, rather than the ReLU function, to better initialize the Bi-Real net.
Experiments on ImageNet show that the Bi-Real net with the proposed training
algorithm achieves 56.4% and 62.2% top-1 accuracy with 18 layers and 34 layers,
respectively. Compared to the state of the art (e.g., XNOR-Net), Bi-Real net
achieves up to 10% higher top-1 accuracy with greater memory savings and lower
computational cost.
Keywords: binary neural network, 1-bit CNNs, 1-layer-per-block
Comment: Accepted to European Conference on Computer Vision (ECCV) 2018. Code
is available at: https://github.com/liuzechun/Bi-Real-ne
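The abstract describes the training novelties only at a high level. As a rough illustration of the first one (a tight surrogate for the derivative of the non-differentiable sign function), here is a minimal PyTorch sketch of a binarization op with a piecewise-polynomial backward pass; the exact surrogate shape used here is an assumption for illustration and is not taken from the paper's code.

```python
import torch

class ApproxSign(torch.autograd.Function):
    """Binarize activations in the forward pass; use a smooth, piecewise
    surrogate for the (almost-everywhere-zero) derivative of sign() in the
    backward pass. The triangular surrogate below is illustrative only."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        grad = torch.zeros_like(x)
        # Nonzero surrogate gradient only inside [-1, 1).
        grad = torch.where((x >= -1) & (x < 0), 2 + 2 * x, grad)
        grad = torch.where((x >= 0) & (x < 1), 2 - 2 * x, grad)
        return grad_output * grad

# Toy usage: gradients flow through the binarization via the surrogate.
x = torch.randn(4, requires_grad=True)
ApproxSign.apply(x).sum().backward()
print(x.grad)
```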
The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task
This study explores whether the Chain-of-Thought approach, known for its
proficiency in language tasks achieved by breaking them down into sub-tasks and
intermediate steps, can also improve vision-language tasks that demand
sophisticated perception and reasoning. We present the "Description then
Decision" strategy, which is inspired by how humans process signals. This
strategy significantly improves probing task performance by 50%, establishing
the groundwork for future research on reasoning paradigms in complex
vision-language tasks.
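The abstract does not detail how the two stages are wired together. The following is a minimal, hypothetical Python sketch of a "Description then Decision" style pipeline; the vlm_query helper and the prompt wording are assumptions for illustration, not the authors' implementation.

```python
def vlm_query(image, prompt: str) -> str:
    """Placeholder for a call to some vision-language model (hypothetical)."""
    raise NotImplementedError("connect this to a VLM of your choice")

def describe_then_decide(image, question: str) -> str:
    # Stage 1 (Description): elicit the visual evidence relevant to the
    # question as explicit intermediate text, without answering yet.
    description = vlm_query(
        image,
        "Describe in detail everything in the image relevant to this "
        f"question, but do not answer it yet: {question}",
    )
    # Stage 2 (Decision): answer the question conditioned on that description,
    # so the final decision reasons over the intermediate text.
    return vlm_query(
        image,
        f"Image description: {description}\nQuestion: {question}\n"
        "Based on the description, give the final answer.",
    )
```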
GANHead: Towards Generative Animatable Neural Head Avatars
Bringing digital avatars into people's lives requires the efficient generation
of complete, realistic, and animatable head avatars. This
task is challenging, and it is difficult for existing methods to satisfy all
the requirements at once. To achieve these goals, we propose GANHead
(Generative Animatable Neural Head Avatar), a novel generative head model that
takes advantage of both the fine-grained control over the explicit expression
parameters and the realistic rendering results of implicit representations.
Specifically, GANHead represents coarse geometry, fine-grained details, and
texture via three networks in canonical space to obtain the ability to generate
complete and realistic head avatars. To achieve flexible animation, we define
the deformation field via standard linear blend skinning (LBS), with the learned
continuous pose and expression bases and LBS weights. This allows the avatars
to be directly animated by FLAME parameters and generalize well to unseen poses
and expressions. Compared to state-of-the-art (SOTA) methods, GANHead achieves
superior performance on head avatar generation and raw scan fitting.
Comment: Camera-ready for CVPR 2023. Project page:
https://wsj-sjtu.github.io/GANHead
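The deformation described above follows standard linear blend skinning, where each canonical point is moved by a weighted combination of per-joint transformations; in GANHead the pose and expression bases and the LBS weights are learned. Below is a minimal PyTorch sketch of generic LBS with illustrative tensor shapes and names; it is not the paper's code.

```python
import torch

def linear_blend_skinning(points_c, lbs_weights, joint_transforms):
    """Deform canonical points with standard LBS.

    points_c:          (N, 3)    points in canonical space
    lbs_weights:       (N, J)    per-point skinning weights (rows sum to 1)
    joint_transforms:  (J, 4, 4) rigid transform of each joint/bone
    returns:           (N, 3)    deformed points
    """
    n = points_c.shape[0]
    # Homogeneous coordinates: (N, 4)
    points_h = torch.cat([points_c, torch.ones(n, 1)], dim=1)
    # Blend the per-joint transforms for each point: (N, 4, 4)
    blended = torch.einsum("nj,jab->nab", lbs_weights, joint_transforms)
    # Apply the blended transform to each point.
    deformed_h = torch.einsum("nab,nb->na", blended, points_h)
    return deformed_h[:, :3]

# Toy usage: identity joint transforms leave the points unchanged.
pts = torch.tensor([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
w = torch.tensor([[1.0, 0.0], [0.5, 0.5]])
T = torch.eye(4).expand(2, 4, 4)
print(linear_blend_skinning(pts, w, T))
```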
HyperStyle3D: Text-Guided 3D Portrait Stylization via Hypernetworks
Portrait stylization is a long-standing task with a wide range of applications.
Although 2D-based methods have made great progress in recent years, real-world
applications such as the metaverse and games often demand 3D content. On the
other hand, the requirement for 3D data, which is costly to acquire,
significantly impedes the development of 3D portrait stylization methods. In
this paper,
inspired by the success of 3D-aware GANs that bridge 2D and 3D domains with 3D
fields as the intermediate representation for rendering 2D images, we propose a
novel method, dubbed HyperStyle3D, based on 3D-aware GANs for 3D portrait
stylization. At the core of our method is a hyper-network learned to manipulate
the parameters of the generator in a single forward pass. It not only offers a
strong capacity to handle multiple styles with a single model, but also enables
flexible fine-grained stylization that affects only texture, shape, or local
part of the portrait. While the use of 3D-aware GANs bypasses the requirement
for 3D data, we further remove the need for style images by using the CLIP
model as the stylization guidance. We conduct an extensive set of experiments
across styles, attributes, and shapes, and also measure 3D consistency. These
experiments demonstrate the superior capability of our
HyperStyle3D model in rendering 3D-consistent images in diverse styles,
deforming the face shape, and editing various attributes.
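The core mechanism is a hypernetwork that maps a style condition (for example, a CLIP text embedding) to parameter offsets for the generator in a single forward pass. The PyTorch sketch below illustrates that idea in generic form; the layer sizes, the way offsets are applied, and the module names are assumptions for illustration and do not reproduce the HyperStyle3D architecture.

```python
import torch
import torch.nn as nn

class HyperNet(nn.Module):
    """Map a style embedding to per-parameter weight offsets for a target
    generator. Illustrative only: a real system would more likely predict
    low-rank or per-channel modulations rather than full weight tensors."""

    def __init__(self, style_dim, target_shapes):
        super().__init__()
        self.target_shapes = target_shapes  # dict: param name -> shape
        self.heads = nn.ModuleDict({
            name.replace(".", "_"):
                nn.Linear(style_dim, int(torch.tensor(shape).prod()))
            for name, shape in target_shapes.items()
        })

    def forward(self, style_emb):
        # One weight-offset tensor per targeted generator parameter.
        return {
            name: self.heads[name.replace(".", "_")](style_emb).view(shape)
            for name, shape in self.target_shapes.items()
        }

def apply_offsets(generator, offsets, scale=0.1):
    # Shift the generator's parameters by the predicted offsets
    # (a single forward pass, no per-style optimization).
    with torch.no_grad():
        for name, param in generator.named_parameters():
            if name in offsets:
                param.add_(scale * offsets[name])

# Toy usage with a stand-in "generator": a single linear layer.
gen = nn.Linear(64, 64)
shapes = {name: tuple(p.shape) for name, p in gen.named_parameters()}
hyper = HyperNet(style_dim=512, target_shapes=shapes)
style_emb = torch.randn(512)  # e.g., a CLIP text embedding
apply_offsets(gen, hyper(style_emb))
```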
- …